An Approach Based on Semantic Relationship Embeddings for Text Classification
Authors
Abstract
Semantic relationships between words provide relevant information about the whole idea in texts. Existing embedding representation models characterize each word as a vector of numbers with a fixed length. These models have been used in tasks involving text classification, such as recommendation and question–answer systems. However, the embedded information provided by semantic relationships has been neglected. Therefore, this paper proposes an approach that involves semantic relationships in embedding models for text classification, which is evaluated. Three embedding models based on relations extracted from Wikipedia are presented and compared with existing word-based models. Our approach considers the following relationships: synonymy, hyponymy, and hyperonymy. They were considered since previous experiments have shown that they provide valuable knowledge. The relations were extracted using lexical-syntactic patterns identified in the literature. Three vector configurations were built: synonymy, hyponymy–hyperonymy, and a combination of all relationships. A Convolutional Neural Network fed with the relationship embeddings was trained for classification. An evaluation was carried out on the proposed configurations to compare them over two corpora. The results were obtained with the metrics precision, accuracy, recall, and F1-measure. The best result for the 20-Newsgroup corpus was obtained with the hyponymy–hyperonymy embeddings, achieving an accuracy of 0.79. For the Reuters corpus, an F1-measure and recall of 0.87 were achieved with the synonymy–hyponymy–hyperonymy embeddings.
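The abstract gives no implementation details; as a rough illustration of the kind of pipeline it describes, the sketch below feeds a precomputed embedding matrix (here randomly initialized and purely hypothetical) into a small 1-D Convolutional Neural Network text classifier with Keras. Vocabulary size, sequence length, and layer sizes are assumptions, not values from the paper.

```python
# Minimal sketch (not the authors' code): a 1-D CNN classifier over a fixed,
# precomputed embedding matrix, e.g. one built from synonymy/hyponymy/hyperonymy
# relations. All sizes below are assumptions for illustration only.
import numpy as np
from tensorflow.keras import layers, models, initializers

vocab_size, embed_dim, max_len, num_classes = 20000, 300, 200, 20

# Hypothetical relationship-embedding matrix: one fixed-length vector per word.
embedding_matrix = np.random.normal(size=(vocab_size, embed_dim)).astype("float32")

model = models.Sequential([
    layers.Input(shape=(max_len,)),
    layers.Embedding(vocab_size, embed_dim,
                     embeddings_initializer=initializers.Constant(embedding_matrix),
                     trainable=False),                      # keep pretrained vectors fixed
    layers.Conv1D(128, kernel_size=5, activation="relu"),   # convolution over word windows
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),        # e.g. 20 classes for 20-Newsgroups
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```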
Similar resources
Text Segmentation based on Semantic Word Embeddings
We explore the use of semantic word embeddings [14, 16, 12] in text segmentation algorithms, including the C99 segmentation algorithm [3, 4] and new algorithms inspired by the distributed word vector representation. By developing a general framework for discussing a class of segmentation objectives, we study the effectiveness of greedy versus exact optimization approaches and suggest a new iter...
From Image to Text Classification: A Novel Approach based on Clustering Word Embeddings
In this paper, we propose a novel approach for text classification based on clustering word embeddings, inspired by the bag of visual words model, which is widely used in computer vision. After each word in a collection of documents is represented as word vector using a pre-trained word embeddings model, a k-means algorithm is applied on the word vectors in order to obtain a fixed-size set of c...
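As a rough sketch of the bag-of-clusters idea summarized above (assumed details, not the cited paper's code), the snippet below clusters hypothetical pretrained word vectors with k-means and represents a document as a normalized histogram over the resulting clusters.

```python
# Sketch of the bag-of-visual-words analogue for text: cluster word vectors,
# then describe each document by the clusters its words fall into.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical pretrained embeddings: word -> 100-dimensional vector.
vocab = ["price", "market", "shares", "goal", "team", "match"]
word_vectors = {w: rng.normal(size=100) for w in vocab}

# Step 1: k-means over the word vectors yields a fixed-size "codebook" of clusters.
k = 3
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(
    np.stack([word_vectors[w] for w in vocab]))

# Step 2: a document becomes a normalized histogram of its words' cluster labels.
def doc_to_histogram(tokens):
    hist = np.zeros(k)
    for t in tokens:
        if t in word_vectors:
            hist[kmeans.predict(word_vectors[t][None, :])[0]] += 1
    return hist / max(hist.sum(), 1)

print(doc_to_histogram(["market", "price", "shares"]))
```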
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency–inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
Bag-of-Embeddings for Text Classification
Words are central to text classification. It has been shown that simple Naive Bayes models with word and bigram features can give highly competitive accuracies when compared to more sophisticated models with part-of-speech, syntax and semantic features. Embeddings offer distributional features about words. We study a conceptually simple classification model by exploiting multiprototype word emb...
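As an illustration of the simple baseline mentioned above (an assumed setup, not the cited paper's experiments), the snippet below trains a multinomial Naive Bayes classifier on unigram and bigram counts with scikit-learn.

```python
# Toy Naive Bayes baseline over word + bigram features; documents and labels
# are made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["the team won the final match",
        "stocks fell as the market closed lower",
        "the striker scored a late goal",
        "investors sold shares after the report"]
labels = ["sport", "finance", "sport", "finance"]

# Unigram + bigram counts feed a multinomial Naive Bayes model.
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
clf.fit(docs, labels)
print(clf.predict(["the market report on shares"]))
```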
An Text Classification Approach Based on the Graph Space Model
Performing text classification on the basis of the VSM and using the maximum common subgraph to measure the similarity of two graphs are relatively common methods, but these methods have not made full use of the semantic information the spatial model contains, so text classification performance is generally poor. In order to improve the classification results of the graph, on the basis of the st...
Journal
Journal title: Mathematics
Year: 2022
ISSN: 2227-7390
DOI: https://doi.org/10.3390/math10214161